Goto

Collaborating Authors

 Monroe County






Adaptive Conformal Prediction via Bayesian Uncertainty Weighting for Hierarchical Healthcare Data

Shahbazi, Marzieh Amiri, Baheri, Ali, Azadeh-Fard, Nasibeh

arXiv.org Machine Learning

Clinical decision-making demands uncertainty quantification that provides both distribution-free coverage guarantees and risk-adaptive precision, requirements that existing methods fail to jointly satisfy. We present a hybrid Bayesian-conformal framework that addresses this fundamental limitation in healthcare predictions. Our approach integrates Bayesian hierarchical random forests with group-aware con-formal calibration, using posterior uncertainties to weight conformity scores while maintaining rigorous coverage validity. Evaluated on 61,538 admissions across 3,793 U.S. hospitals and 4 regions, our method achieves target coverage (94.3% vs 95% target) with adaptive precision: 21% narrower intervals for low-uncertainty cases while appropriately widening for high-risk predictions. Critically, we demonstrate that well-calibrated Bayesian uncertainties alone severely under-cover (14.1%), highlighting the necessity of our hybrid approach. This framework enables risk-stratified clinical protocols, efficient resource planning for high-confidence predictions, and conservative allocation with enhanced oversight for uncertain cases, providing uncertainty-aware decision support across diverse healthcare settings.


Multiclass Boosting: Simple and Intuitive Weak Learning Criteria

Neural Information Processing Systems

We study a generalization of boosting to the multiclass setting. We introduce a weak learning condition for multiclass classification that captures the original notion of weak learnability as being "slightly better than random guessing".




Machine learning to optimize precision in the analysis of randomized trials: A journey in pre-specified, yet data-adaptive learning

Balzer, Laura B., van der Laan, Mark J., Petersen, Maya L.

arXiv.org Machine Learning

Covariate adjustment is an approach to improve the precision of trial analyses by adjusting for baseline variables that are prognostic of the primary endpoint. Motivated by the SEARCH Universal HIV Test-and-Treat Trial (2013-2017), we tell our story of developing, evaluating, and implementing a machine learning-based approach for covariate adjustment. We provide the rationale for as well as the practical concerns with such an approach for estimating marginal effects. Using schematics, we illustrate our procedure: targeted machine learning estimation (TMLE) with Adaptive Pre-specification. Briefly, sample-splitting is used to data-adaptively select the combination of estimators of the outcome regression (i.e., the conditional expectation of the outcome given the trial arm and covariates) and known propensity score (i.e., the conditional probability of being randomized to the intervention given the covariates) that minimizes the cross-validated variance estimate and, thereby, maximizes empirical efficiency. We discuss our approach for evaluating finite sample performance with parametric and plasmode simulations, pre-specifying the Statistical Analysis Plan, and unblinding in real-time on video conference with our colleagues from around the world. We present the results from applying our approach in the primary, pre-specified analysis of 8 recently published trials (2022-2024). We conclude with practical recommendations and an invitation to implement our approach in the primary analysis of your next trial.


Topology Identification and Inference over Graphs

Mateos, Gonzalo, Shen, Yanning, Giannakis, Georgios B., Swami, Ananthram

arXiv.org Machine Learning

Topology identification and inference of processes evolving over graphs arise in timely applications involving brain, transportation, financial, power, as well as social and information networks. This chapter provides an overview of graph topology identification and statistical inference methods for multidimensional relational data. Approaches for undirected links connecting graph nodes are outlined, going all the way from correlation metrics to covariance selection, and revealing ties with smooth signal priors. To account for directional (possibly causal) relations among nodal variables and address the limitations of linear time-invariant models in handling dynamic as well as nonlinear dependencies, a principled framework is surveyed to capture these complexities through judiciously selected kernels from a prescribed dictionary. Generalizations are also described via structural equations and vector autoregressions that can exploit attributes such as low rank, sparsity, acyclicity, and smoothness to model dynamic processes over possibly time-evolving topologies. It is argued that this approach supports both batch and online learning algorithms with convergence rate guarantees, is amenable to tensor (that is, multi-way array) formulations as well as decompositions that are well-suited for multidimensional network data, and can seamlessly leverage high-order statistical information.